
feat: support for Spark 4 #589


Open · wants to merge 11 commits into main

Conversation

@razvan (Member) commented Jul 9, 2025

Description

Part of: #586

Depends on the corresponding image PR: stackabletech/docker-images#1216

Spark 4 is considered experimental because of the following issues:

The integration tests have been updated to exclude Spark 4 from the tests known to cause problems.

Definition of Done Checklist

  • Not all of these items apply to every PR; the author should update this template to leave in only the relevant boxes
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and the deployed operator works
  • Integration tests passed (for non-trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

@razvan razvan self-assigned this Jul 9, 2025
razvan and others added 6 commits July 10, 2025 15:33
* feat(helm): Add RBAC rule for automatic cluster domain detection

* chore: Bump stackable-operator to 0.94.0 and update other dependencies

* chore: Update changelog

* chore: Add sparkhistory and shs shortnames
@razvan razvan mentioned this pull request Jul 23, 2025
@razvan razvan marked this pull request as ready for review July 23, 2025 15:43
@adwk67 (Member) left a comment


Just minor text stuff. Tests to come.
A CHANGELOG entry is missing - not sure if it is needed as this is basically all test & doc changes.


=== Maven packages

The last and most flexible way to provision dependencies is to use the built-in `spark-submit` support for Maven package coordinates.
The downside of this method is that job dependencies are downloaded every time the job is submitted and this has several implications you must be aware of.
For example, the job submission time will be longer than with the other methods
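As a sketch of how Maven coordinates are typically declared for this operator: the `deps.packages` field and the Iceberg coordinate below are illustrative assumptions based on the SparkApplication CRD, not taken from this PR.

```yaml
# Hypothetical SparkApplication excerpt: the deps.packages list is passed
# through to spark-submit as Maven package coordinates (--packages).
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: example-job
spec:
  deps:
    packages:
      # Example coordinate only; any groupId:artifactId:version works here.
      - org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0
```

Note that every submission re-downloads these packages at submit time, which is the source of the longer submission times and the network-failure caveat described above.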

Suggested change
For example, the job submission time will be longer than with the other methods
For example, the job submission time will be longer than with the other methods.

The downside of this method is that job dependencies are downloaded every time the job is submitted and this has several implications you must be aware of.
For example, the job submission time will be longer than with the other methods
Network connectivity problems may lead to job submission failures.
And finally, not all type of dependencies can be provisioned this way. Most notably, JDBC drivers cannot be provisioned this way since the JVM will only look for them at startup time.

Suggested change
And finally, not all type of dependencies can be provisioned this way. Most notably, JDBC drivers cannot be provisioned this way since the JVM will only look for them at startup time.
And finally, not all type of dependencies can be provisioned this way.
Most notably, JDBC drivers cannot be provisioned this way since the JVM will only look for them at startup time.

If you need access to JDBC sources from your Spark application, consider building your own custom Spark image as shown above.
As mentioned above, not all dependencies can be provisioned this way.
JDBC drivers are notorious for not being supported by this method but other types of dependencies may also not work.
If a jar file can be provisioned using it's Maven coordinates or not, depends a lot on the way it is loaded by the JVM.

Suggested change
If a jar file can be provisioned using it's Maven coordinates or not, depends a lot on the way it is loaded by the JVM.
If a jar file can be provisioned using its Maven coordinates or not, depends a lot on the way it is loaded by the JVM.
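As a sketch of the custom-image route mentioned in the excerpt above: the `sparkImage.custom` field is assumed from the operator's product-image CRD struct, and the registry/tag shown is a placeholder.

```yaml
# Hypothetical: run the job on a user-built image that bakes in a JDBC
# driver, since JDBC drivers cannot be provisioned via Maven coordinates.
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: jdbc-job
spec:
  sparkImage:
    custom: my-registry.example.com/spark-with-jdbc-driver:latest  # placeholder
    productVersion: "3.5.5"
```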

@@ -3,5 +3,6 @@
// Stackable Platform documentation.
// Please sort the versions in descending order (newest first)

- 4.0.0 (Hadoop 3.4.1, Scala 2.13, Python 3.11, Java 17) (Experimental)
- 3.5.5 (Hadoop 3.3.4, Scala 2.12, Python 3.11, Java 17) (Deprecated)

Suggested change
- 3.5.5 (Hadoop 3.3.4, Scala 2.12, Python 3.11, Java 17) (Deprecated)
- 3.5.6 (Hadoop 3.3.4, Scala 2.12, Python 3.11, Java 17) (LTS)

@@ -3,5 +3,6 @@
// Stackable Platform documentation.
// Please sort the versions in descending order (newest first)

- 4.0.0 (Hadoop 3.4.1, Scala 2.13, Python 3.11, Java 17) (Experimental)
- 3.5.5 (Hadoop 3.3.4, Scala 2.12, Python 3.11, Java 17) (Deprecated)
- 3.5.6 (Hadoop 3.3.4, Scala 2.12, Python 3.11, Java 17) (LTS)

Suggested change
- 3.5.6 (Hadoop 3.3.4, Scala 2.12, Python 3.11, Java 17) (LTS)
- 3.5.5 (Hadoop 3.3.4, Scala 2.12, Python 3.11, Java 17) (Deprecated)

@adwk67 adwk67 moved this to Development: In Review in Stackable Engineering Jul 28, 2025
@adwk67 (Member) commented Jul 29, 2025

🟢 Local tests (nightly suite) are all good.
